The packages used in this project are rio (Chan et al., 2021), readr (Wickham and Hester, 2021), and haven (Wickham and Miller, 2021).
On April 14th, 1912, on its maiden voyage, the Titanic struck an iceberg. Two hours and 40 minutes later, approximately 62% of the passengers had perished. Prior research has compared the characteristics of those who survived the sinking of the Titanic with those who died in order to better understand which attributes were prioritized in a life-and-death situation during that era. The purpose of this study is to further explore the most commonly studied characteristics (class, gender, and age) using descriptive statistics, data visualization, and predictive models (e.g., logistic regression and conditional inference classification trees). Logistic regression results indicate that all three demographic attributes are significant predictors of survival. However, classification tree results suggest that gender had the largest effect on survival, followed by class. Interestingly, age was a significant differentiating attribute only among males.
Work in brief statement about class system, passengers, and voyage.
The Titanic was a British passenger liner that featured the most advanced technology available in 1912. On its maiden voyage, it collided with an iceberg just before midnight and sank in the 2-degree Celsius North Atlantic Ocean, killing over two-thirds of the passengers and crew on board (Balakumar et al., 2019; Frey et al., 2011; Hall, 1986). Many documented factors contributed to the high death toll. There were not enough lifeboats on board: the Titanic carried twenty lifeboats, which had room for only 52% of the passengers on board at that time (Frey et al., 2011; Hall, 1986; Symanzik et al., 2019), and a portion of the lifeboats that were launched were not full (Frey et al., 2011; Symanzik et al., 2019). Those who did not get a seat in a lifeboat were very likely to perish because of the frigid ocean temperature (Hall, 1986; Frey et al., 2011) and because the partially full lifeboats that had been lowered reportedly made no attempt to rescue people from the water (Hall, 1986). Beyond the media inspired by the disaster, datasets have been released that describe the passengers and crew on the Titanic, including their sex, age, nationality, ticket fare, social class, ticket number, passenger or crewmember status, parent status, sibling status, spouse status, and port of embarkation (Balakumar et al., 2019). These data have enabled researchers to determine who was on the Titanic and whether any of these variables were significant predictors of survival or death.
The Titanic took approximately two hours and forty minutes to sink, considerably longer than some other maritime disasters such as the Lusitania, which sank only about 18 minutes after being struck by a torpedo (Frey et al., 2011). It has been hypothesized that this longer interval between the collision and the sinking left room for social norms to operate, whereas on the Lusitania the more imminent danger may have triggered a fight-or-flight response and more selfish behavior (Frey et al., 2011). For example, evacuating women and children before men was a social norm and code of conduct in 1912 (Farag & Hassan, 2018), and Captain Edward Smith reportedly shouted, “Women and children first” after the Titanic collided with the iceberg (Farag & Hassan, 2018). The additional time may also have given an advantage to first-class passengers over second- and third-class passengers, and to second-class passengers over third-class passengers, because higher-class passengers were more likely to give commands that crewmembers would heed and had the financial means to bargain with crewmembers (Frey et al., 2011).

NOTE: Wealth Gap here - Inline text now optional
As shown in the graph below, after adjusting for inflation, the average price for a first-class cabin in 1912 was $150.00, which today would be $4,241.74…
ggplotly(fare_graph)
Frey et al. (2011) explain that lifeboats were stored closest to the first-class cabins, that first-class passengers had more access to information about the disaster, and that they were more likely to have a relationship with the officers who gave orders for loading lifeboats, all of which may have given first-class passengers an advantage in survival. Because the Titanic was perceived to be a British ship, British passengers may also have been favored by crewmembers and had a greater chance of survival than passengers who were not British.
Since the population of interest for this study is passengers who were aboard the Titanic during its sinking, passengers who disembarked at Cherbourg, Queenstown, and Southampton (n = 35) as well as crew members were excluded from the analyses. Missing data were handled through listwise deletion of two passengers who did not have their ages recorded. Thus, the analytic sample consisted of 1,315 passengers. The table below shows a breakdown of the analytic sample by class affiliation.
dat %>%
  group_by(class) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  adorn_totals() %>% # janitor: append a "Total" row
  kable(caption = "Breakdown of Passengers by Class",
        col.names = c("Class", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
| Class | Count | Percent |
|---|---|---|
| 1st Class | 324 | 24.64 |
| 2nd Class | 284 | 21.60 |
| 3rd Class | 707 | 53.76 |
| Total | 1315 | 100.00 |
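The sample construction described above (dropping passengers who left the ship before the sinking, then listwise deletion of records with a missing age) can be sketched as follows. This is only an illustration in base R: the data frame `raw`, the column `disembarked_early`, and all values are hypothetical stand-ins, not the study's actual data or variable names.

```r
# Toy stand-in for the raw passenger data; all names and values are hypothetical.
raw <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  age = c(29, NA, 41, 2, 35),
  disembarked_early = c(FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Step 1: keep only passengers who were aboard at the time of the sinking.
aboard <- raw[!raw$disembarked_early, ]

# Step 2: listwise deletion, dropping any remaining row with a missing age.
analytic <- aboard[!is.na(aboard$age), ]

nrow(analytic)
```

Applying the exclusion before the deletion keeps the count of deleted records tied to the population of interest rather than to everyone in the raw file.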
Passenger ages ranged from 0 to 74 years (M = 31.42, SD = 13.92). The table below shows the distribution of ages by class. On average, first-class passengers were substantially older than second- and third-class passengers. This may suggest that the trip served a different purpose for that group of passengers, such as recreation and experience versus business travel and immigration (Hall, 1986).
dat %>%
  group_by(class) %>%
  summarize(avg_age = mean(age), std_age = sd(age),
            min_age = min(age), max_age = max(age)) %>%
  kable(caption = "Average Age by Class",
        col.names = c("Class", "Average Age", "SD Age", "Min. Age", "Max. Age"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
| Class | Average Age | SD Age | Min. Age | Max. Age |
|---|---|---|---|---|
| 1st Class | 39.14 | 13.55 | 0 | 71 |
| 2nd Class | 30.01 | 13.90 | 0 | 71 |
| 3rd Class | 25.12 | 11.71 | 0 | 74 |
The table below shows the list of nationalities reported by the Titanic’s passengers. The most common nationalities were English (22.43%), American (18.40%), and Irish (9.28%). The majority of first-class passengers were American (60.19%), whereas the majority of second-class passengers were English (51.06%). Third-class passengers were the most diverse group, with the most common nationalities being English (15.84%), Irish (14.85%), Swedish (12.73%), and Syrian/Lebanese (11.74%). These differences in nationalities were likely due to the large number of individuals in third class who were immigrating to America (Hall, 1986).
dat %>%
  filter(!is.na(nationality2)) %>% # drop passengers with no recorded nationality
  group_by(nationality2) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  arrange(desc(percent)) %>%
  kable(caption = "Breakdown of Passenger Nationalities",
        col.names = c("Nationality", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_styling(fixed_thead = TRUE, full_width = FALSE, html_font = "Cambria",
                bootstrap_options = c("striped", "hover"))
| Nationality | Count | Percent |
|---|---|---|
| English | 295 | 22.43 |
| American | 242 | 18.40 |
| Irish | 122 | 9.28 |
| Other - Multiple | 108 | 8.21 |
| Swedish | 100 | 7.60 |
| Syrian/Lebanese | 85 | 6.46 |
| Finnish | 58 | 4.41 |
| Canadian | 37 | 2.81 |
| Bulgarian | 31 | 2.36 |
| Croatian | 28 | 2.13 |
| French | 26 | 1.98 |
| Norwegian | 26 | 1.98 |
| Belgian | 25 | 1.90 |
| Scottish | 17 | 1.29 |
| Channel Islander | 15 | 1.14 |
| Swiss | 13 | 0.99 |
| Danish | 10 | 0.76 |
| Italian | 9 | 0.68 |
| German | 8 | 0.61 |
| Spanish | 8 | 0.61 |
| Welsh | 8 | 0.61 |
| Polish | 6 | 0.46 |
| Bosnian | 4 | 0.30 |
| Hong Kongese | 4 | 0.30 |
| South African | 4 | 0.30 |
| Greek | 3 | 0.23 |
| Lithuanian | 3 | 0.23 |
| Uruguayan | 3 | 0.23 |
| Australian | 2 | 0.15 |
| Chinese | 2 | 0.15 |
| Portuguese | 2 | 0.15 |
| Slovenian | 2 | 0.15 |
| Austrian | 1 | 0.08 |
| Dutch | 1 | 0.08 |
| Egyptian | 1 | 0.08 |
| Haitian | 1 | 0.08 |
| Hungarian | 1 | 0.08 |
| Japanese | 1 | 0.08 |
| Latvian | 1 | 0.08 |
| Mexican | 1 | 0.08 |
| Turkish | 1 | 0.08 |
The primary outcome of interest was survival status, which was recorded as a dichotomous factor variable (lost or saved).
Independent variables included class (which serves as a proxy for socioeconomic status), gender, and age. Class was recorded as a three-level factor variable (first-class, second-class, and third-class), whereas gender was recorded as a dichotomous factor variable (female or male). Age (in years) was recorded as a continuous variable.
Data analysis was performed using RStudio: Integrated Development Environment for R (RStudio Team, 2021) version 4.1.1. Descriptive statistics were computed to describe the analytic sample as well as to compare survival rates across demographic subgroups of interest. Density ridges were graphed in order to visualize survival rate differences for gender and class subgroups across age ranges. Next, a logistic regression model was estimated to examine whether the main effects of gender (reference group = female), class (reference group = first class), and age were significant predictors of surviving the disaster. To assess how these groups interact to influence survival, as well as which variable was the most influential, a conditional inference classification tree was estimated. Conditional inference trees combine recursive partitioning with statistical inference. This type of classification tree uses splitting criteria based on Bonferroni-corrected statistical significance testing, which minimizes biases often associated with traditional classification trees (Hothorn et al., 2006). Alpha was set at .05 (i.e., a 95% confidence level) for all multivariate analyses.
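The main-effects logistic regression described above can be sketched in base R as follows. This is a minimal illustration, not the study's fitting code (which is not shown): the data are simulated from made-up coefficients, the object names `sim`, `m1`, and `or_table` are assumptions, and the intervals are Wald intervals from `confint.default()`, which may differ slightly from the intervals reported in the results table.

```r
# Simulated stand-in data; NOT the Titanic passenger data used in the study.
set.seed(1)
n <- 500
sim <- data.frame(
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE),
                  levels = c("Female", "Male")),
  class = factor(sample(c("1st Class", "2nd Class", "3rd Class"), n, replace = TRUE),
                 levels = c("1st Class", "2nd Class", "3rd Class")),
  age = round(runif(n, 0, 74))
)

# Generate a binary outcome from arbitrary, made-up coefficients.
log_odds <- 2 - 2.5 * (sim$gender == "Male") -
  1.2 * (sim$class == "2nd Class") - 2.2 * (sim$class == "3rd Class") -
  0.03 * sim$age
sim$survived <- rbinom(n, size = 1, prob = plogis(log_odds))

# Main-effects logistic regression; the first factor level is the reference group.
m1 <- glm(survived ~ age + gender + class, data = sim, family = binomial)

# Exponentiate coefficients and Wald interval limits to obtain odds ratios.
or_table <- round(exp(cbind(OR = coef(m1), confint.default(m1))), 2)
or_table
```

Setting the factor levels explicitly makes the reference groups (female, first class) visible in the code instead of depending on alphabetical ordering.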
OVERALL…
dat %>%
  group_by(survived) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  adorn_totals() %>% # janitor: append a "Total" row
  kable(caption = "Overall Survival Outcomes",
        col.names = c("Outcomes", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
| Outcomes | Count | Percent |
|---|---|---|
| Lost | 815 | 61.98 |
| Saved | 500 | 38.02 |
| Total | 1315 | 100.00 |
When examining the descriptive statistics broken down by class and gender, there are substantial disparities in survival. As shown in the table below, 62.04% of first-class passengers survived, compared to 41.55% of second-class passengers and 25.60% of third-class passengers.
dat %>%
  group_by(class, survived) %>%
  summarize(count = n()) %>% # result stays grouped by class
  mutate(percent = (count / sum(count)) * 100) %>% # percent within each class
  arrange(class, survived) %>%
  kable(caption = "Survival Rate by Class",
        col.names = c("Class", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
| Class | Survived | Count | Percent |
|---|---|---|---|
| 1st Class | Lost | 123 | 37.96 |
| 1st Class | Saved | 201 | 62.04 |
| 2nd Class | Lost | 166 | 58.45 |
| 2nd Class | Saved | 118 | 41.55 |
| 3rd Class | Lost | 526 | 74.40 |
| 3rd Class | Saved | 181 | 25.60 |
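As a quick check, the within-class percentages can be reproduced from the counts in the table above. The sketch below uses base R's `prop.table()` instead of the dplyr pipeline; the matrix name `counts` is an illustrative choice.

```r
# Counts copied from the "Survival Rate by Class" table.
counts <- matrix(
  c(123, 201,
    166, 118,
    526, 181),
  nrow = 3, byrow = TRUE,
  dimnames = list(class = c("1st Class", "2nd Class", "3rd Class"),
                  survived = c("Lost", "Saved"))
)

# Row percentages: the two outcomes within each class sum to 100.
pct <- round(prop.table(counts, margin = 1) * 100, 2)
pct
```

The "Saved" column gives the survival rate for each class: 62.04, 41.55, and 25.60 percent for first, second, and third class.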
As shown in the table below, 72.75% of female passengers survived compared to 18.96% of male passengers.
dat %>%
  group_by(gender, survived) %>%
  summarize(count = n()) %>% # result stays grouped by gender
  mutate(percent = (count / sum(count)) * 100) %>% # percent within each gender
  arrange(gender, survived) %>%
  kable(caption = "Survival Rate by Gender",
        col.names = c("Gender", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
| Gender | Survived | Count | Percent |
|---|---|---|---|
| Female | Lost | 127 | 27.25 |
| Female | Saved | 339 | 72.75 |
| Male | Lost | 688 | 81.04 |
| Male | Saved | 161 | 18.96 |
The table below shows survival rates broken down by both class and gender. Only five first-class female passengers (3.47%) lost their lives, while 96.53% survived. Among first-class male passengers, 65.56% lost their lives and 34.44% survived. Among second-class female passengers, 11.32% perished and 88.68% survived. For second-class male passengers, 86.52% perished and 13.48% survived. Among third-class female passengers, 50.93% lost their lives and 49.07% survived, whereas 84.73% of third-class male passengers lost their lives and 15.27% survived. These differences in rates highlight how class and gender may interact to predict survival.
dat %>%
  group_by(class, gender, survived) %>%
  summarize(count = n()) %>% # result stays grouped by class and gender
  mutate(percent = (count / sum(count)) * 100) %>% # percent within each class-gender cell
  arrange(class, gender) %>%
  kable(caption = "Survival Rate by Class and Gender",
        col.names = c("Class", "Gender", "Survived", "Count", "Percent"),
        digits = 2,
        booktabs = TRUE) %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
| Class | Gender | Survived | Count | Percent |
|---|---|---|---|---|
| 1st Class | Female | Lost | 5 | 3.47 |
| 1st Class | Female | Saved | 139 | 96.53 |
| 1st Class | Male | Lost | 118 | 65.56 |
| 1st Class | Male | Saved | 62 | 34.44 |
| 2nd Class | Female | Lost | 12 | 11.32 |
| 2nd Class | Female | Saved | 94 | 88.68 |
| 2nd Class | Male | Lost | 154 | 86.52 |
| 2nd Class | Male | Saved | 24 | 13.48 |
| 3rd Class | Female | Lost | 110 | 50.93 |
| 3rd Class | Female | Saved | 106 | 49.07 |
| 3rd Class | Male | Lost | 416 | 84.73 |
| 3rd Class | Male | Saved | 75 | 15.27 |
Furthermore, age was also an important factor that contributed to survival. As shown in the figure below, age…Something here
surv_ageclass_hist
Results of the main-effects logistic regression model predicting survival are shown in the table below. When controlling for the effects of gender and class, age was a significant predictor of survival (OR 0.97; 95% CI 0.95, 0.98; p<0.001). With each additional year of age, passengers’ odds of survival decreased by three percent. When controlling for the effects of age and gender, class affiliation was a significant predictor of survival (second class: OR 0.27; 95% CI 0.18, 0.40; p<0.001; third class: OR 0.10; 95% CI 0.07, 0.15; p<0.001). Compared to first-class passengers, second-class passengers’ and third-class passengers’ odds of surviving the disaster were 73% lower and 90% lower, respectively. Gender was also a significant predictor of survival (OR 0.08; 95% CI 0.06, 0.11; p<0.001), even when controlling for class and age. Male passengers faced 92% lower odds of survival compared to female passengers. Taken together, these results indicate that class, age, and gender each significantly affected the odds of survival, even when controlling for one another.
tbl_m1
| Characteristic | OR | 95% CI | p-value |
|---|---|---|---|
| age | 0.97 | 0.95, 0.98 | <0.001 |
| gender | | | |
| Female | Reference | | |
| Male | 0.08 | 0.06, 0.11 | <0.001 |
| class | | | |
| 1st Class | Reference | | |
| 2nd Class | 0.27 | 0.18, 0.40 | <0.001 |
| 3rd Class | 0.10 | 0.07, 0.15 | <0.001 |

Note: OR = Odds Ratio, CI = Confidence Interval
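The percentage statements in the text follow directly from the odds ratios in the table: an odds ratio below 1 corresponds to a (1 − OR) × 100 percent reduction in the odds of survival. A short sketch of that arithmetic, with the vector names chosen for illustration:

```r
# Odds ratios copied from the logistic regression table.
or <- c(age = 0.97, male = 0.08, second_class = 0.27, third_class = 0.10)

# Percent change in the odds of survival relative to the reference group
# (or per additional year, for age): (OR - 1) * 100.
pct_change <- round((or - 1) * 100)
pct_change
```

The negative values (−3, −92, −73, −90) are reductions in the odds, not in the probability, of survival.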
The figure below shows the results of the conditional classification tree used to model survival. The tree’s terminal nodes identified the following eight subgroups:

1. First-class females
2. Second-class females
3. Third-class females
4. First-class males, 54 years of age or younger
5. First-class males, older than 54 years of age
6. Second-class males, nine years of age or younger
7. Third-class males, nine years of age or younger
8. Second- and third-class males, older than nine years of age

The terminal nodes’ barplots indicate the breakdown of survival for each subgroup (black = survival, gray = loss of life). Each gender was stratified by class, suggesting that class was an important predictor of survival for both males and females. However, class had a much smaller effect among women (p = .044) than among men (p < .001). Female subgroups were not split by age, whereas all male subgroups were split by age following class, which indicates that age had a larger effect among males than females. Furthermore, the age split for first-class males (54 years of age) is substantially higher than the age split among second- and third-class males (nine years of age), which aligns with the wider age distribution of first-class males previously observed in the density ridge graphs. Interestingly, second- and third-class males over the age of nine were not split by class. When examining the model as a whole, the base node was gender (p < .001), suggesting it was the greatest predictor of survival entered into the model. Thus, based on the order of the tree splits, one can hypothesize that gender was the largest predictor of survival, followed by class and age, respectively.
plot(ctree, main = "Predicting Survival From Gender, Class, and Age")
NEEDS FINISHED